[EP] config rdma #619
… into ep-efa-stability
[EP] EFA stability test
    // Add self-ranks, sub other ranks
    if (thread_id < kNumRanks) {
        atomicAdd_system(barrier_signal_ptrs[rank] + thread_id, FINISHED_SUM_TAG);
        memory_fence();
Is the memory_fence() needed here?
I followed ROCm/DeepEP here: https://github.com/ROCm/DeepEP/blob/main/csrc/kernels/utils.cuh#L792
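To make the question concrete, here is a minimal host-side sketch of the same add-self/sub-peer barrier, written with `std::atomic` and `std::thread` instead of `atomicAdd_system` and `memory_fence()`. It is only an analogue under stated assumptions, not the kernel from this PR: `HostBarrier`, `arrive_and_wait`, `run_barrier_demo`, the rank count, and the tag value are all hypothetical, and the C++ memory model is not the same as the GPU system-scope model. The fence marked "under review" plays the role of the `memory_fence()` in the diff.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

constexpr int kNumRanks = 4;            // assumed rank count for the demo
constexpr int FINISHED_SUM_TAG = 1024;  // assumed value; only the name comes from the diff

// Hypothetical host-side stand-in for barrier_signal_ptrs:
// signals[r][p] is the slot owned by rank r for peer p.
struct HostBarrier {
    std::array<std::array<std::atomic<int>, kNumRanks>, kNumRanks> signals;

    HostBarrier() {
        for (auto& row : signals)
            for (auto& slot : row) slot.store(0, std::memory_order_relaxed);
    }

    void arrive_and_wait(int rank) {
        assert(rank >= 0 && rank < kNumRanks);
        // Publish this rank's earlier plain writes before signalling.
        std::atomic_thread_fence(std::memory_order_release);
        for (int peer = 0; peer < kNumRanks; ++peer) {
            // Add to our own slot for this peer ...
            signals[rank][peer].fetch_add(FINISHED_SUM_TAG, std::memory_order_relaxed);
            // ... fence under review: orders the add before the sub below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            // ... then subtract from the peer's slot for us.
            signals[peer][rank].fetch_sub(FINISHED_SUM_TAG, std::memory_order_relaxed);
        }
        // A slot returns to zero once both sides have arrived.
        for (int peer = 0; peer < kNumRanks; ++peer)
            while (signals[rank][peer].load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();
        // Make the peers' pre-barrier writes visible to the code that follows.
        std::atomic_thread_fence(std::memory_order_acquire);
    }
};

// Each rank writes one payload entry, crosses the barrier, then sums all entries.
std::vector<int> run_barrier_demo() {
    HostBarrier barrier;
    std::array<int, kNumRanks> payload{};
    std::vector<int> sums(kNumRanks, 0);
    std::vector<std::thread> threads;
    for (int rank = 0; rank < kNumRanks; ++rank) {
        threads.emplace_back([&barrier, &payload, &sums, rank] {
            payload[rank] = rank + 1;
            barrier.arrive_and_wait(rank);
            int sum = 0;
            for (int v : payload) sum += v;
            sums[rank] = sum;
        });
    }
    for (auto& t : threads) t.join();
    return sums;
}
```

In this host version the release fence before the loop and the acquire fence after the spin are what make the payload visible across ranks; the middle fence only constrains the order of the two atomics. Whether the equivalent fence is required on AMD hardware is exactly the open question in this thread, and the sketch does not settle it.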
…ebug-amd-mem-consistency-yang
…ebug-amd-mem-consistency-yang
        perror("Failed to query device attributes");
        exit(1);
    }
I tried this and
attr.max_dest_rd_atomic = dev_attr.max_qp_init_rd_atom;
It gets really slow on the AMD platform.
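For reference, a small self-contained sketch of how the queried device limits could be turned into queue-pair depths. The structs below are hypothetical stand-ins that only mirror field names from `ibv_device_attr` and `ibv_qp_attr`, so the snippet compiles without libibverbs; `pick_atomic_depths` and the `requested` parameter are likewise made up for illustration. As far as I understand the verbs documentation, `max_dest_rd_atomic` (responder resources) is bounded by `max_qp_rd_atom`, while `max_rd_atomic` (initiator depth) is bounded by `max_qp_init_rd_atom`; treat that pairing as an assumption to verify against the target NIC.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two relevant fields of ibv_device_attr.
struct DeviceLimits {
    int max_qp_rd_atom;       // limit with this device as responder (target)
    int max_qp_init_rd_atom;  // limit with this device as initiator
};

// Hypothetical stand-in for the two relevant fields of ibv_qp_attr.
struct QpAtomicDepths {
    std::uint8_t max_rd_atomic;       // outstanding reads/atomics we initiate
    std::uint8_t max_dest_rd_atomic;  // responder resources for incoming ones
};

// Clamp a requested depth to the device limit and to the uint8_t field width.
inline std::uint8_t clamp_depth(int requested, int device_limit) {
    assert(device_limit >= 0);
    const int upper = std::min(device_limit, 255);
    return static_cast<std::uint8_t>(std::min(std::max(requested, 0), upper));
}

// Pick both depths from one requested value instead of always using the maximum.
inline QpAtomicDepths pick_atomic_depths(const DeviceLimits& dev, int requested) {
    QpAtomicDepths depths{};
    depths.max_rd_atomic = clamp_depth(requested, dev.max_qp_init_rd_atom);
    depths.max_dest_rd_atomic = clamp_depth(requested, dev.max_qp_rd_atom);
    return depths;
}
```

Keeping `requested` as a tunable makes it easy to compare a small fixed depth against the device maximum when chasing the slowdown reported on AMD.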
    }
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIPCC__)
    memory_fence();
#endif
https://github.com/ROCm/DeepEP/blob/main/csrc/kernels/internode.cu#L906-L907
The ROCm one doesn't seem to have memory_fence